Integrative Forecasting and Analysis of Stock
Price Using Neural Network and ARIMA Model
Jin "Max" Li
Outline
1. Executive Summary
2. Background
3. Methodology
4. Experiments and Results
5. Conclusion
Section 1
Executive Summary
Summary of Methodologies
-Data Collection
-Data Cleaning
-Data Visualization
-Predictive Analytics with Deep Learning and ARIMA Models
Summary of Experiments and Results
-Predictive Analytics Results
Section 2
Background
Investors in finance and risk management are interested in forecasting the returns from direct investment in financial assets, including stocks, bonds, and commodities. Among these, the stock index is one of the most essential indicators: investors rely heavily on it to make optimal decisions that minimize losses and maximize returns.
However, forecasting the stock index precisely has been challenging because financial time series are inherently non-stationary. In addition, macro-level factors, such as political and public events, company performance, consumer behavior, and national well-being, can significantly affect stock prices. Once these factors become public knowledge, the market adjusts in response, which can make prediction seem impossible.
Nevertheless, since the market is not perfectly efficient, we can still predict future patterns based on historical market fluctuations.
In financial market analysis and prediction, studies have addressed both short-term stock price forecasting, on a scale of days to months, and long-term forecasting, on a scale of years. Both are valuable for decision-making: short-term forecasting captures the details of changing stock prices, while long-term forecasting reflects the macro-level market trend under specific changes and events. Investors are therefore concerned with both the short-term and the long-term performance of the stock market, and combining the two kinds of forecasting makes the summary of financial patterns more effective and comprehensive.
However, most studies focus on implementing forecasting models without treating financial forecasting as an integrated subject, and short-term and long-term forecasting research is usually done separately. This project proposes an integrated short-term and long-term forecasting method to address this gap.
Section 3
Methodology
Data collection and cleaning
We used the yfinance library to collect daily Nasdaq-100 Index data and monthly Nasdaq Composite Index data. Each dataset contains the opening price, highest price, lowest price, closing price, volume, dividends, and stock splits.
Further column calculations can be performed if percentage-change data are needed.
Make sure to drop NaN values (e.g., rows for holidays).
We do not use two of the columns, so we create a new data frame that stores the values without them.
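A minimal pandas sketch of the collection and cleaning steps above. It assumes the yfinance ticker symbols `^NDX` (Nasdaq-100) and `^IXIC` (Nasdaq Composite) and assumes the two unused columns are `Dividends` and `Stock Splits`; neither detail is stated in the original, and `fetch_history` and `clean_prices` are hypothetical helper names.

```python
import pandas as pd

# Assumption: the two unused columns are yfinance's "Dividends" and "Stock Splits".
UNUSED_COLUMNS = ["Dividends", "Stock Splits"]


def fetch_history(ticker: str, interval: str) -> pd.DataFrame:
    """Download the full price history for `ticker` with bars of `interval` ("1d", "1mo")."""
    import yfinance as yf  # imported lazily so the cleaning step also runs offline

    return yf.Ticker(ticker).history(period="max", interval=interval)


def clean_prices(raw: pd.DataFrame, add_pct_change: bool = True) -> pd.DataFrame:
    """Drop the unused columns and NaN rows, optionally adding a percentage-change column."""
    df = raw.drop(columns=UNUSED_COLUMNS, errors="ignore").dropna()
    if add_pct_change:
        # Percentage change of the closing price relative to the previous row.
        df["Pct Change"] = df["Close"].pct_change() * 100
        df = df.dropna()  # the first row has no previous close
    return df


# Usage (requires network access; ticker symbols are assumptions):
# daily = clean_prices(fetch_history("^NDX", "1d"))      # Nasdaq-100, daily
# monthly = clean_prices(fetch_history("^IXIC", "1mo"))  # Nasdaq Composite, monthly
```

Dropping NaN rows before computing the percentage change keeps `pct_change` from bridging gaps with filled-in values.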
Methodology
Data visualization
We used the Matplotlib library to visualize the raw data.
When plotting the whole dataset, plot each column in a separate graph: prices and trading volume, for example, differ by orders of magnitude and are hard to read on shared axes.
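A small Matplotlib sketch of plotting each column separately. `plot_columns` is a hypothetical helper, and the figure size and shared time axis are our choices rather than the original's.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def plot_columns(df: pd.DataFrame, title: str = "") -> Figure:
    """Plot every column of `df` in its own subplot, sharing the time (x) axis."""
    n = len(df.columns)
    fig, axes = plt.subplots(n, 1, sharex=True, figsize=(10, 2.2 * n), squeeze=False)
    for ax, column in zip(axes[:, 0], df.columns):
        ax.plot(df.index, df[column])
        ax.set_ylabel(column)  # one y-scale per column
    axes[-1, 0].set_xlabel("Time")
    fig.suptitle(title)
    fig.tight_layout()
    return fig
```

pandas' `DataFrame.plot(subplots=True)` produces a similar one-axis-per-column layout with less code.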
Methodology
A hybrid model based on the autoregressive integrated moving average (ARIMA) model is proposed for the long-term data to fit the stock price pattern and market trend. For short-term patterns, we use a hybrid neural network that combines a Convolutional Neural Network (CNN) with a bidirectional Long Short-Term Memory (BiLSTM) network. The proposed approach is detailed in Figure 1.
Figure 1. Process Summary.
Methodology
Forecasting of the daily stock price using neural network
To accurately predict the cyclical pattern and capture the short-term features, we propose a CNN-BiLSTM neural network to forecast the next-day stock closing price. The CNN layer extracts the local features of the stock time-series data, whereas the BiLSTM, consisting of a forward LSTM layer and a backward LSTM layer, captures dependencies in both temporal directions of the input sequence. Finally, a dense (fully connected) layer receives the output of the preceding BiLSTM layer and generates the final prediction.
The model setup is given in Figure 2.
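Forecasting the next-day close requires framing the daily series as supervised samples. The sketch below assumes a fixed lookback window and that the closing price is stored in column 0; the original does not state the window length, and `make_windows` is a hypothetical name. Scaling the features (for example, min-max normalization) is common before training but is not shown.

```python
import numpy as np


def make_windows(series, lookback: int):
    """Frame a (time, features) array as inputs of shape (samples, lookback, features)
    and next-day targets taken from column 0 (assumed to be the closing price)."""
    data = np.asarray(series, dtype=np.float32)
    if data.ndim == 1:
        data = data[:, np.newaxis]  # single feature -> shape (time, 1)
    n_samples = len(data) - lookback
    if n_samples <= 0:
        raise ValueError("series must be longer than the lookback window")
    # Window i covers days i .. i+lookback-1; its target is the close on day i+lookback.
    X = np.stack([data[i:i + lookback] for i in range(n_samples)])
    y = data[lookback:, 0]
    return X, y
```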
Methodology
CNN-BiLSTM Model
Figure 2. CNN-BiLSTM model setup diagram. [Diagram components: Input Layer; CNN Layers (Convolutional Layer, Pooling Layer); hidden states h1, h2, h3, ..., hn along the Time axis; Dense Layer; Output.]
Import the TensorFlow and Keras libraries
Set up the layers:
-CNN layers
-Pooling layer
-BiLSTM layer (more layers can be added according to the research interest)
-Dense layer
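A minimal Keras sketch of these steps using the Table 1 settings (64 filters, kernel size 1, `same` padding, ReLU, pool size 1, 40 BiLSTM units). The input window length, feature count, Adam optimizer, and mean-squared-error loss are assumptions, and `build_cnn_bilstm` is a hypothetical name; this is not the original code.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn_bilstm(lookback: int, n_features: int) -> keras.Model:
    """CNN-BiLSTM with the Table 1 settings; lookback and n_features are assumptions."""
    model = keras.Sequential([
        keras.Input(shape=(lookback, n_features)),
        # CNN layer: extracts local features at each time step.
        layers.Conv1D(filters=64, kernel_size=1, padding="same", activation="relu"),
        # Pooling layer with pool size 1, so the sequence length is unchanged.
        layers.MaxPooling1D(pool_size=1),
        # BiLSTM layer: forward and backward LSTMs with 40 units each.
        layers.Bidirectional(layers.LSTM(40)),
        # More layers could be inserted here depending on the research interest.
        # Dense layer: maps the BiLSTM output to the next-day closing price.
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")  # assumption: optimizer and loss
    return model
```

With a pool size of 1 the pooling layer is effectively a pass-through, so it acts as a placeholder that can be widened in later experiments.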
Methodology
Forecasting of the monthly stock price using hybrid ARIMA
Neural networks need a large amount of training data, but a monthly series contains comparatively few observations, which does not satisfy the conditions under which neural networks perform best. Long-term data generally reveal the overall market trend and are considered more linear. Therefore, we propose a hybrid ARIMA model to forecast the monthly stock price using the monthly Nasdaq Composite data described above.
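The document does not spell out how the ARIMA and LSTM outputs are combined. A common additive formulation, assumed here, treats the series as a linear component fitted by ARIMA plus a nonlinear component learned by the neural network from the ARIMA residuals:

```latex
\begin{aligned}
y_t       &= L_t + N_t, \\
e_t       &= y_t - \hat{L}_t, \\
\hat{e}_t &= f\left(e_{t-1}, e_{t-2}, \dots, e_{t-k}\right), \\
\hat{y}_t &= \hat{L}_t + \hat{e}_t,
\end{aligned}
```

where $L_t$ and $N_t$ are the linear and nonlinear components, $\hat{L}_t$ is the ARIMA forecast, $e_t$ is the ARIMA residual, $f$ is the LSTM, and $k$ is the number of lagged residuals fed to it.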
[Diagram components: Dataset, Standardization, ARIMA, Residual of ARIMA, LSTM, Dense layer, Output.]
Figure 3. ARIMA-LSTM model setup diagram.
Methodology
ARIMA-LSTM Model
Import libraries
-Import the ARIMA model from statsmodels
-Import the LSTM layer from TensorFlow
-Import the evaluation metrics
Optimize the ARIMA model
-Find the optimal p, d, q for the ARIMA model
Set up ARIMA
-Get the ARIMA forecast
Get the result
-Calculate the residual of the prediction and use that column as the input of the LSTM
Data transform
-Reshape the ARIMA output (the residual column) to fit the LSTM input data format
Set up the LSTM layer
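A sketch of the data transform and LSTM setup: the 1-D residual series is reshaped into the (samples, timesteps, features) array that Keras LSTM layers expect, and a 128-unit LSTM (Table 3) predicts the next residual. The lookback length, optimizer, and loss are assumptions, and `hybrid_forecast` adds the predicted residual to the ARIMA forecast, an additive combination the original does not state explicitly. All function names are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def residual_windows(residuals, lookback: int):
    """Reshape a 1-D residual series into LSTM inputs (samples, lookback, 1) and targets."""
    r = np.asarray(residuals, dtype=np.float32).reshape(-1)
    if len(r) <= lookback:
        raise ValueError("need more residuals than the lookback length")
    X = np.stack([r[i:i + lookback] for i in range(len(r) - lookback)])[..., np.newaxis]
    y = r[lookback:]  # the residual that follows each window
    return X, y


def build_residual_lstm(lookback: int) -> keras.Model:
    """LSTM with 128 units (Table 3) and a dense output for the next residual."""
    model = keras.Sequential([
        keras.Input(shape=(lookback, 1)),
        layers.LSTM(128),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")  # assumption: optimizer and loss
    return model


def hybrid_forecast(arima_forecast, predicted_residual):
    """Assumed additive combination: ARIMA forecast plus LSTM-predicted residual."""
    return np.asarray(arima_forecast, dtype=float) + np.asarray(predicted_residual, dtype=float).reshape(-1)
```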
Section 4
Experiments and Results
Parameters (CNN-BiLSTM)
Parameter | Value
Convolutional layer filters | 64
Convolutional layer kernel size | 1
Convolutional layer padding | Same
Convolutional layer activation function | ReLU
Max pooling layer pool size | 1
BiLSTM layer units | 40
Table 1. Parameter specification of the proposed CNN-BiLSTM model.
Model | Details
LSTM1 | LSTM layer with 100 units
LSTM2 | LSTM layer with 200 units
BiLSTM1 | BiLSTM layer with 100 units
BiLSTM2 | BiLSTM layer with 200 units
CNN-LSTM | Convolutional layer with 64 filters; max pooling layer with pool size 1; LSTM layer with 100 units
CNN-BiLSTM | Convolutional layer with 64 filters; max pooling layer with pool size 1; BiLSTM layer with 40 units
Table 2. Parameter specification of all neural network models.
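The six comparison models in Table 2 differ only in their recurrent layer and in whether a convolutional front end is used, so they can be generated from one hypothetical builder. The kernel size and padding for the CNN-LSTM model are assumed to match Table 1, and the dense output layer, optimizer, and loss are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Table 2 models: name -> (bidirectional recurrent layer?, units, CNN front end?)
MODEL_SPECS = {
    "LSTM1": (False, 100, False),
    "LSTM2": (False, 200, False),
    "BiLSTM1": (True, 100, False),
    "BiLSTM2": (True, 200, False),
    "CNN-LSTM": (False, 100, True),
    "CNN-BiLSTM": (True, 40, True),
}


def build_variant(name: str, lookback: int, n_features: int) -> keras.Model:
    """Build one Table 2 model; lookback and n_features are assumptions."""
    bidirectional, units, use_cnn = MODEL_SPECS[name]
    model = keras.Sequential(name=name.replace("-", "_"))
    model.add(keras.Input(shape=(lookback, n_features)))
    if use_cnn:
        # Convolutional layer with 64 filters and max pooling with pool size 1.
        model.add(layers.Conv1D(64, kernel_size=1, padding="same", activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=1))
    recurrent = layers.LSTM(units)
    model.add(layers.Bidirectional(recurrent) if bidirectional else recurrent)
    model.add(layers.Dense(1))  # next-day closing price
    model.compile(optimizer="adam", loss="mse")  # assumption: optimizer and loss
    return model
```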
Experiments and Results
Parameters (ARIMA-LSTM)
Table 3. Parameter specification of the proposed ARIMA-LSTM model.
Parameter | Value
Order of the autoregressive part: p | 0
Degree of differencing: d | 4
Order of the moving average: q | 5
LSTM layer units | 128
Experiments and Results
Evaluation Criteria
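Mean absolute error (MAE)
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
Root mean square error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
R-square (R²)
$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values, and $n$ the number of observations.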
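A direct NumPy transcription of the three criteria, useful for checking reported values; the function names are ours. scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` compute the same quantities (RMSE being the square root of the mean squared error).

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred) -> float:
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For example, with actual values (3, 5, 7) and predictions (2, 5, 9), MAE = 1, RMSE = √(5/3) ≈ 1.29, and R² = 1 - 5/8 = 0.375.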
Experiments and Results
Experiment results (CNN-BiLSTM)
The models compared are specified in Table 2.
In each model's plot, the orange curve is the forecast and the blue curve is the actual stock price; the x-axis denotes time and the y-axis denotes the stock price in United States dollars. Ranked from the lowest to the highest degree of fit between the predicted and actual values, the models are LSTM1, BiLSTM2, LSTM2, BiLSTM1, CNN-LSTM, and CNN-BiLSTM. The CNN-BiLSTM predictions fit the actual observations almost perfectly.
Model | MAE | RMSE | R²
LSTM1 | 211.158 | 253.405 | 0.96749
BiLSTM2 | 166.655 | 204.117 | 0.97890
LSTM2 | 162.994 | 199.605 | 0.97983
BiLSTM1 | 159.250 | 196.269 | 0.98049
CNN-LSTM | 130.299 | 174.116 | 0.98465
CNN-BiLSTM | 129.252 | 173.724 | 0.98472
Table 4. Comparison of the evaluation criteria of neural network models.
Experiments and Results
Experiment results (ARIMA-LSTM)
In each model's plot, the orange curve is the forecast and the blue curve is the actual stock price; the x-axis denotes time and the y-axis denotes the stock price in United States dollars. Judging by how closely the predicted curves fit the actual observations, the ARIMA-LSTM model produced the best forecast. The plot shows that the ARIMA model alone failed to capture the nonlinear variations of the time series. The standalone LSTM model, trained only on the monthly closing prices, also failed to forecast accurately with so little data, although the general trend is visible in its forecast.
[Figure: monthly forecasts versus actual prices in three panels (ARIMA, LSTM, ARIMA-LSTM); x-axis: Time; y-axis: Stock price.]
Table 5. Comparison of the evaluation criterion (RMSE) of the monthly forecasting models.
Model | RMSE
LSTM | 2688.831
ARIMA | 1487.514
ARIMA-LSTM | 878.532
Section 5
Conclusion
In this project, two major models were used to forecast Nasdaq index prices. We proposed a neural network model to predict the daily price of the Nasdaq-100 on a short-term scale, representing the daily prices of the most representative Nasdaq-listed stocks, and an improved ARIMA model to forecast the Nasdaq Composite on a long-term scale, representing the monthly performance of all stocks listed on the Nasdaq stock exchange. The results demonstrate the feasibility of the integrated models for stock price forecasting, with comparatively satisfactory accuracy.
Moreover, the integrated forecasting model provides results that reflect market patterns from different perspectives, which helps investors better understand the market trend and guides their decisions. The experiments indicate that although neural network models are widely used in financial forecasting, combining statistical models with neural network processing, as in the ARIMA-LSTM model, also shows noticeable potential. Because of the market's volatility, more economic indicators could be added as training features to further improve the forecasting ability.
Our future work will improve the proposed models' forecasting accuracy by adding different features to the long-term and short-term data, such as major stock market indices, gold prices, and other economic indicators. Furthermore, we plan to closely inspect the short-term fluctuations and long-term trends of financial data to explore the significance of the correlation between them.
Finally, optimizing the configuration of the proposed CNN-BiLSTM will be another primary concern, as its parameters should be continually adjusted to achieve better forecast performance.
Thank you!